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Abstract 

It was recently demonstrated in II 1311 that the denoising performance of Non-Local Means (NLM) 
can be improved at large noise levels by replacing the mean by the robust Euclidean median. Numerical 

CN ■ 

I , experiments on synthetic and natural images showed that the latter consistently performed better than 

NLM beyond a certain noise level, and significantly so for images with sharp edges. The Euclidean mean 
and median can be put into a common regression (on the patch space) framework, in which the l 2 norm 
q " of the residuals is considered in the former, while the l\ norm is considered in the latter. The natural 

■ question then is what happens if we consider £ p (0 < p < 1) regression? We investigate this possibility 

qq ■ in this paper. 

t— i ; 

i— —i. Index Terms 

> 

Image denoising, non-local means, non-local Euclidean medians, edges, inlier-outlier model, robust- 
, ness, sparsity, non-convex optimization, iteratively reweighted least squares. 

,— i ■ I. Introduction 

>: 

In the last few years, some very effective frameworks for image restoration have been proposed 
. that exploit non-locality (long-distance correlations) in images, and/or use patches instead of pixels 

to robustly compare photometric similarities. The archetype algorithm in this regard is the Non-Local 
t— ( ' Means (NLM) HI . The success of NLM triggered a huge amount of research, leading to state-of-the-art 

algorithms that exploit non-locality and/or the patch model in specialized ways; e.g., see fl3j, Q, (9), 
> '• 0, 10, 0, ©, to name a few. We refer the interested reader to (21, Q for detailed reviews. Of 

^ ■ these, the best performing method till date is perhaps the hybrid BM3D algorithm [9], which effectively 

" combines the NLM framework with other classical algorithms. 

To setup notations, we recall the working of NLM. Let u = (ut ) be some linear indexing of the input 
noisy image. The standard setting is that u is the corrupted version of some clean image / = (/;), 

Ui = fi + aZi, (1) 
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where (zi) is iid A/"(0, 1). The goal is to estimate (approximate) / from the noisy measurement u, possibly 
given a good estimate of the noise floor a. In NLM, the restored image u = («,) is computed using the 
simple formula 



E 



jeS(i) W iJ U J 



(2) 



where toy is some weight (affinity) assigned to pixels i and j. Here S(i) is the neighborhood of pixel 
i over which the averaging is performed. To exploit non-local correlations, S(i) is ideally set to the 
whole image domain. In practice, however, one restricts S(i) to a geometric neighborhood, e.g., to a 
sufficiently large window of size S x S around i m . The other idea in NLM is to set the weights using 
image patches centered around each pixel. In particular, for a given pixel i, let denote the restriction 
of u to a square window around i. Letting k be the length of this window, this associates every pixel i 
with a point P; in R fc2 (the patch space). The weights in standard NLM are set to be 



«-,;. ■■ ■■ <-xp ( -^HPi-Pj-H 2 ). (3) 



where ||Pj — Pj|| is the Euclidean distance between P; and Pj as points in R fc , and h is a smoothing 
parameter. Along with non-locality it is the use of patches that makes NLM more robust in comparison 
to pixel-based neighborhood filters E2]], EH, IfTOH . 

Recently, it was demonstrated in [11311 that the denoising performance of NLM can be improved (often 
substantially for images with sharp edges) by replacing the £ 2 regression in NLM with the more robust 
l-y regression. More precisely, given weights w^j, note that (EJ is equivalent to performing the following 
regression (on the patch space): 

P 4 = arg min ^ w l0 \ | P - P, | j 2 , (4) 

and then setting ui to be the center pixel in P^. Indeed, this reduces to © once we write the regression 
in terms of the center pixel iii. The idea in [11311 was to use l\ regression instead, namely, to compute 

Pi = arg mm ^ w tJ \\P - P 3 -||, (5) 

and then set iii to be the center pixel in P^. Note that ([5]) is a convex optimization, and the minimizer 
(the Euclidean median) is unique when k > 1 II 141 . The resulting estimator was called the Non-Local 
Euclidean Medians (NLEM). A numerical scheme was proposed in [11311 for computing the Euclidean 
median using a sequence of weighted least-squares. It was demonstrated that NLEM performed con- 
sistently better than NLM on a large class of synthetic and natural images, as soon as the noise was 
above a certain threshold. More specifically, it was shown that the bulk of the improvement in NLEM 
came from pixels situated close to edges. An inlier-outlier model of the patch space around an edge 
was proposed, and the improvement was attributed to the robustness of ([5]) in the presence of outliers 

In this paper, we show how a simple extension of the above idea can dramatically improve the 
denoising performance of NLM, and even that of NLEM. This is the content of Section II. In particular, 
a general optimization and algorithmic framework is provided that includes NLM and NLEM as special 
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cases. Some numerical results on synthetic and natural images are provided in Section III to justify our 
claims. Possible extensions of the present work are discussed in Section IV. 

II. Non-Local Patch Regression 

A. Robust patch regression 

It is well-known that l\ minimization is more robust to outliers than £ 2 minimization. A simple 
argument is that the unsquared residuals ||P — Pj|| in ([5]) are better guarded against the aberrant data 
points compared to the squared residuals ||P — Pj|| 2 . The former tends to better suppress the large 
residuals that may result from outliers. This basic principle of robust statistics can be traced back to 
the works of von Neumann, Tukey 111611 . and Huber [17], and lies at the heart of several recent work 
on the design of robust estimators; e.g., see [15], and the references therein. 

A natural question is what happens if we replace the l\ regression in ([5]) by l( v< \\ regression? In 
general, one could consider the following class of problems: 

Pi = argmin ^ Wij ||P - Pj \\ p . (6) 

The intuitive idea here is that, by taking smaller values of p, we can better suppress the residuals ||P — 
Pj|| induced by the outliers. This should make the regression even more robust to outliers, compared 
to what we get with p = 1. We note that a flip side of setting p < 1 is that ([6]) will no longer be convex 
(this is essentially because t h-> \t\ p is convex if and only if p > 1), and it is in general difficult to find 
the global minimizer of a non-convex functional. However, we do have a good chance of finding the 
global optimum if we can initialize the solver close to the global optimum. The purpose of this note is 
to numerically demonstrate that, for all sufficiently large a, the u obtained by solving © (and letting 
u.i to be the center pixel in P s ) results in a more robust approximation of / as p — > 0, than what is 
obtained using NLM. Henceforth, we will refer to ([6]) as Non-Local Patch Regression (NLPR), where p 
is generally allowed to take values in the range (0, 2]. 

B. Iterative solver 

The usefulness of the above idea actually stems from the fact that there exists a simple iterative 
solver for ([6]). In fact, the idea was influenced by the well-known connection between 'sparsity' and 
'robustness', particularly the use of 1{ P< \) minimization for best-basis selection and exact sparse recovery 
II18II . H19II , II22II . We were particularly motivated by the iteratively reweighted least squares (IRLS) 
approach of Daubechies et al. [12111 . and a regularized version of IRLS developed by Chartrand for non- 
convex optimization I1T9H . H20ll . We will adapt the regularized IRLS algorithm in I1T9H . 11201 for solving 
©. The exact working of this iterative solver is as follows. We use the NLM estimate to initialize the 
algorithm, that is, we set 

p( ) = EjgSM WjjPj 

Then, at every iteration k > 1, we write ||P-Pj || p = ||P-Pj -|| 2 - ||P-Pj \\ p ~ 2 in ©, and use the current 
estimate to approximate this by ||P — Pj|| 2 • jjP^" 1 ) — Pj|| p ~ 2 . This gives us the surrogate least-squares 
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problem 

PW =arg min V Wij ^ r ~j^. (8) 

Here e( fe ' > is used as a guard against division by zero, and is gradually shrunk to zero as the iteration 
progresses. We refer the reader to [19] for details. The solution of © is explicitly given by 

where 

M (. fe ) = (||p(fe-l)_p.||2 + £ W) P /2-l. 

The minimizer of © is taken to be the limit of the iterates, assuming that it exists. While we cannot 
provide any guarantee on local convergence at this point, we note that ([9]) can be expressed as a 
gradient descent step (with appropriate step size) of a smooth surrogate of ©. This interpretation 
leads to the well-known Weiszfeld algorithm (for the special case p = 1), which is known to converge 
linearly 112611 . H27H . Alternatively, one could adapt more sophisticated IRLS algorithms (e.g., the one in 
H21I0 . which come with proven guarantees on local convergence, to the case p < 1. 

The overall computational complexity of NLPR is 0(k 2 S 2 I) per pixel, where I is the average number 
of iterations. For NLM, the complexity is 0{k 2 S 2 ) per pixel. For a given convergence accuracy, we have 
noticed that / increases as p decreases. In particular, a large number of iterations are required in the 
non-convex regime < p < 0.4. In this case, we halt the computation after a sufficiently large number 
of iterations. 

Algorithm 1 Non-Local Patch Regression (NLPR) 

Input: Noisy image u = (ui), and parameters h, S, k,p. 

Return: Denoised image u = (iii). 

(1) Extract patch of size k x k at every pixel i. 

(2) For every pixel i, do 

(a) Set Wij = exp(-||Pi - Pj\\ 2 /h 2 ) for every j e S(i). 

(b) Sort Wij,j s S(i), in non-increasing order. 

(c) Let ji, j 2 , ■ • ■ , js 2 De tne re-indexing of j e S(i) as per the above order. 



(d) Find patch P that minimizes Y}t=l( 2] w Ut II p ~ p 

(e) Set Ui to be the center pixel in P. 



C. Robustness using k-nearest neighbors 

We noticed in [11311 that a simple heuristic often provides a remarkable improvement in the perfor- 
mance of NLM. In ©, one considers all patches Pj,j £ S(i), drawn from the geometric neighborhood 
of pixel i. However, notice that when a patch is close to an edge, then roughly half of its neighboring 
patches are on one side (the correct side) of the edge. Following this observation, we consider only 
the top 50% of the the neighboring patches that have the largest weights. That is, the selected patches 
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correspond to the [r/2] -nearest neighbors of in the patch space, where r — \S(i)\. While this tends 
to inhibit the diffusion at low noise levels (in smooth regions), it was demonstrated in [11311 that it can 
significantly improve the robustness of NLM and NLEM at large a. We will also use this heuristic in 
NLPR. The overall scheme is summarized in Algorithm[TJ We use S(i) to denote a window of size S x S 
centered at pixel i in the algorithm. 

III. Numerical Experiments 

To understand the denoising performance of NLPR, we provide some limited results on synthetic and 
natural images. The main theme of our investigation would be to understand how the performance of 
NLPR changes with the regression index p. For a quantitative comparison of the denoising results, we 
will use the standard peak-signal-to-noise ratio (PSNR). For an TV-pixel image, with intensity scaled to 
[0, 1], this is defined to be -101og 10 (e), where e = (1/N) X^iO"* - /i) 2 - 



Fig. 1. Left: Test image Checker. Right: PSNRs obtained using NLPR for the test image Checker at different a and p. To compare 
the different regressions, we skipped steps (b) and (c) in Algorithm^ i.e., we consider all the neighboring patches, and not just 
the top 50%. 

We first consider the test image of Checker used in H13H . This serves as a good model for simultaneously 
testing the denoising quality in smooth regions and in the vicinity of edges. We corrupt Checker as per 
the noise model in ([T|). We then compute the denoised image using Algorithm[T] with the exception that 
we skip steps (b) and (c), that is, we use the full neighborhood S(i). We initialize the iterations of the 
IRLS solver using jT]). For all the experiments in this paper, we fix the parameters to be S = 21, k = 7, 
and h = IOct. These are the settings originally proposed in [1]. The results obtained using these settings 
are not necessarily optimal, and other settings could have been used as well. The point is to fix all the 
parameters in Algorithm [TJ except p. This means that the same are used for different p. We now 
run the above denoising experiment for a = 10, 20, ... , 100, and for p = 0.1, 0.5, 1, 1.5, 2. 

The results are shown in Figure [TJ We notice that, beyond a certain noise level, NLPR performs better 
when p is close to zero. In fact, the PSNR increases gradually from p = 2 to p = 0.1, for a fixed a. At 
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(a) Clean and noisy edge (cr = 0.3). (b) Weights. 



Fig. 2. Ideal edge of length 256 used to evaluate the performance of NLPR. Each patch P m is formed using 3 points around 
position m, i.e., the patch space is of dimension 3 (shown in Figure [3j. The reference patch P; corresponds to the position 
i = 130 (situated close to the edge). The weights in [2b] are computed between the reference patch and the neighboring patches 
P 3 ,j £ [i — 20, i + 20]. 



lower noise levels, the situation reverses completely, and NLPR tends to perform better around p = 2. 
A possible explanation is that the true neighbors in patch space are well identified at low noise levels, 
and since the noise is Gaussian, £ 2 regression gives statistically optimal results. 
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(a) 3d patch space. 
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(b) First 2 coordinates. 
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Fig. 3. 
NLPR 



Inlier-outlier model of the patch space for the reference point marked in Figure [2] Note that the estimate returned by 
jets better as p goes from 2 to 0.1. This is consistent with the results in Figure [T] 



An analysis of the above results shows us that, as p — > 0, the bulk of the improvement comes from 
pixels situated in the vicinity of edges. A similar observation was also made in [11311 for NLEM. To 
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understand this better, we recall the ideal 0-1 edge model used in [13]. This is shown in Figure [2aJ We 
add noise of strength a = 0.3 to the edge, and denoise it using NLPR. We examine the regression at a 
reference point situated just right to the edge (cf. Figure l2b1) . The patch space at this point is specified 
using k = 3 and S — 41. The distribution of patches is shown in Figure |3j Note that the patches are 
clustered around the centers A = (0, 0, 0) and B = (1, 1, 1). For the reference point, the points around 
A are the outliers, while the ones around B are the inliers. We now perform £ p regression on this 
distribution for p = 0.1,1, and 2. The results obtained (Algorithm [T] steps (b) and (c) skipped) from 
a single noise realization are shown in Figure [3j The exact values of the estimate in this case are 0.61 
(p = 2), 0.75 (p = 1), and 0.98 (0.1). The average estimate over 10 noise realizations are 0.58 (p = 2), 
0.82 (p = 1), and 0.95 (p = 0.1). 




We note that the working of the IRLS algorithm provides some insight into the robustness of i v 
regression. Note that when p = 2 (NLM), the reconstruction in ([6]) is linear; the contribution of each 
noisy patch Pj is controlled by the corresponding weight Wij . On the other hand, the reconstruction is 
non-linear when p < 2. The contribution of each Pj is controlled not only by the respective weights, but 
also by the multipliers fj,^ . In particular, the limiting value of the multipliers dictate the contribution 
of each Pj in the final reconstruction. Figure (|4]) gives the distribution of the sorted multipliers (at 
convergence) for the experiment described above. In this case, the large multipliers correspond to the 
inliers, and the small multipliers correspond to the outliers. Notice that when p = 0.5, the tail part of the 
multipliers (outliers) has much smaller values (close to zero) compared to the leading part (inliers) . In 
a sense, the iterative algorithm gradually 'learns' the outliers from the patch distribution as the iteration 
progresses, which are finally taken out of estimation. 
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(c) NLM output. (d) NLPR output (p = 0.1). 



Fig. 5. Denoising results on the 256 x 256 Barbara image obtained using NLM and NLPR. The PSNRs are respectively: (b) 16.11 
dB, (c) 23.53 dB, and (d) 25.39 dB. Notice that the edges and the texture patterns (on the scarf, pants, and table cloth) are 
much better restored in NLPR. 



TABLE I 

Comparison of NLM and NLPR (p = 0.1) at noise levels a = 10, 20, ... , 100 (results averaged over 10 noise 

realizations) 



Image 


Method 










PSNR (dB) 










House 


NLM 
NLPR 


34.25 

33.23 


29.76 
30.23 


26.88 
27.86 


25.21 
26.40 


24.08 
25.45 


23.34 
24.69 


22.81 
24.10 


22.42 
23.52 


22.05 
22.93 


21.80 
22.41 


Barbara 


NLM 
NLPR 


32.38 

31.50 


27.38 
28.42 


24.94 
26.51 


23.53 
25.39 


22.65 
24.57 


22.03 
23.84 


21.62 
23.21 


21.30 
22.60 


21.07 
22.06 


20.87 
21.56 


Boat 


NLM 
NLPR 


30.78 

30.54 


26.71 
27.23 


24.73 
25.50 


23.64 
24.50 


22.95 
23.87 


22.48 
23.40 


22.12 
22.95 


21.88 
22.54 


21.65 
22.11 


21.45 
21.68 


Cameraman 


NLM 
NLPR 


31.39 

31.17 


27.90 

27.46 


24.78 
25.15 


22.93 
25.15 


21.89 
22.68 


21.14 
22.12 


20.62 
21.67 


20.20 
21.36 


19.88 
20.97 


19.61 
20.63 


Peppers 


NLM 
NLPR 


32.34 

31.20 


27.66 
27.67 


24.95 
25.56 


23.13 
24.18 


21.89 
23.03 


21.01 
22.15 


20.43 
21.62 


19.98 
21.13 


19.63 
20.70 


19.40 
20.34 
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IV. Discussion 

We compare the PSNRs obtained using NLPR (p = 0.1) with that of NLM for some standard natural 
images in Table [I] We notice that, for each of the images, NLPR consistently outperforms NLM at large 
noise levels. The gain in PSNR is often as large as 2 dB. The results obtained for Barbara using NLM 
and NLPR are compared in Figure [5] Note that, as expected, robust regression provides a much better 
restoration of the sharp edges in the image than NLM. What is probably surprising is that the restoration 
is superior even in the textured regions. Note, however, that NLM tends to perform better in the smooth 
regions. For example, we some more noise grains in the smooth regions in Figure [5d] compared that 
in Figure [5c] This suggests that an 'adaptive' optimization framework, which combines £ 2 regression 
(in smooth regions) and £( p <\) regression (in the vicinity of edges), might possibly perform better than 
a fixed £ p regression. Some other possible extensions of the present work are as follows: (i) Local 
convergence analysis of the present IRLS algorithm, and ways of improving it; (ii) Possibility of using 
more efficient numerical algorithms for solving ([6]); (iii) Finding better ways of estimating the denoised 
pixel Ui from the estimated patch P, (the projection method used here is probably the simplest); (iv) 
Use of 'better' weights than the ones used in standard NLM (24j, Il25l : and (v) Formulation of a 'joint' 
optimization framework for ©, where the optimization is performed with respect to w,j and P [|6||. 
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